pglog: a flexible log analyser

PGConf US - New York
2021-12-03

Who am I?

Image by Anemone123 from Pixabay

Pglog

  • Why?
  • How?
  • What?
Image by xresch from Pixabay

Why reinvent the wheel?

  • Existing tools were frustrating
    • Working on a subset of data is painful
    • Only provide "standards" stats
Image by Andy Giraud from Pixabay

Why reinvent the wheel?

  • Examples:
    • Production peak analysis
    • In-depth analysis of temp file usage
Image by Andy Giraud from Pixabay
If only I could use SQL to query my logs...

How to install it?

  • Never in production
  • Simple extension
  • Everything you need is in the README
  • Dockerfile option
Image by Harry Strauss from Pixabay

Prerequisites

  • A computer
  • PostgreSQL version 11 or newer
  • pglog project code
  • PostgreSQL logs to analyze (from PostgreSQL 9.0 onward)
Image by Harry Strauss from Pixabay

Or

  • Computer
  • pglog's Dockerfile
  • Docker desktop
  • PostgreSQL logs to analyze (from PostgreSQL 9.0 onward)
Image by Harry Strauss from Pixabay

About logs

  • CSV-formatted logs
  • with enough detail
log_destination='csvlog'
logging_collector=on
log_min_duration_statement=250ms
log_autovacuum_min_duration=0
log_checkpoints=on
log_connections=on
log_disconnections=on
log_lock_waits=on
log_temp_files=0
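
To double-check that these settings are active on the instance producing the logs, a quick query against pg_settings works (a minimal sketch; extend the list as needed):

select name, setting
from pg_settings
where name in ('log_destination', 'logging_collector',
               'log_min_duration_statement', 'log_temp_files');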

The pglog table

  • \d pglog
  • Partitioned table
  • FDW or physical table
Image by Jeong Eun Lee from Pixabay

The pglog table

laetitia=# \d pglog.pglog
                 Partitioned table "pglog.pglog"
         Column         |            Type
------------------------+-----------------------------
 log_time               | timestamp(3) with time zone 
 user_name              | text                        
 database_name          | text                        
 session_id             | text                        
 session_line_num       | bigint                      
 session_start_time     | timestamp with time zone    
 error_severity         | text                        
 message                | text                        
 query                  | text                        
 query_id               | bigint                      
Image by Jeong Eun Lee from Pixabay

Creating the pglog table

  • For a single file
pglog.create_logtable_from_file(
                file,
                directory,
                version
                [, schema]
                [, materialized]);
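
A call could look like this (a sketch only: the file name, directory, and version are placeholder values, and the README documents the exact argument formats):

select pglog.create_logtable_from_file(
    'postgresql-2021-11-30_000000.csv', -- placeholder log file name
    '/var/lib/postgresql/log',          -- placeholder log directory
    14);                                -- placeholder version (check the README for the expected type)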
Image by Jeong Eun Lee from Pixabay

Creating the pglog table

  • For all CSV files in a directory
pglog.create_logtable_from_dir(directory,
                version
                [, schema]
                [, materialized]);
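
And for a whole directory (again a sketch with placeholder values; the last argument is assumed to pick a physical table over the FDW):

select pglog.create_logtable_from_dir(
    '/var/lib/postgresql/log', -- placeholder log directory
    14,                        -- placeholder version
    'pglog',                   -- optional target schema
    true);                     -- assumed: true materializes a physical table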
Image by Jeong Eun Lee from Pixabay

Reports

Image by Gerd Altmann from Pixabay

Error report

select * from error_report;
 error_severity |   nb    | percentage
----------------+---------+------------
 FATAL          |       8 |       0.00
 ERROR          |     106 |       0.00
 WARNING        |       3 |       0.00
 LOG            | 6055776 |     100.00
(4 rows)
Image by Ryan McGuire from Pixabay

Customizing error report

select pglog.error_severity,
    count(*) as nb,
    round(100::numeric * count(*)::numeric / sum(count(*)) over (), 2) as percentage
from pglog.pglog
join (values (0, 'PANIC'::text), (1, 'FATAL'), (2, 'ERROR'), (3, 'WARNING'),
             (4, 'LOG'), (5, 'NOTICE'), (6, 'INFO'), (7, 'DEBUG')
     ) severity(level, name)
  on pglog.error_severity = severity.name
where log_time between '2021-12-03' and '2021-12-04'
  and database_name = 'laetitia'
group by pglog.error_severity, severity.level
order by severity.level;
Image by Ryan McGuire from Pixabay

Detailed report for fatal messages

select * from error_analyze_fatal;
                       message                       | count |            min             |            max
-----------------------------------------------------+-------+----------------------------+----------------------------
 terminating connection due to administrator command |     7 | 2021-08-25 13:01:32.992+00 | 2021-08-27 09:12:44.884+00
 connection to client lost                           |     2 | 2021-08-25 12:20:24.044+00 | 2021-08-25 12:20:24.044+00
(2 rows) 
Image by Ryan McGuire from Pixabay

Autovacuum report

select * from autovacuum_report
order by vacuum_count desc fetch first 5 rows only;
 schema_name |  table_name   | vacuum_count | analyze_count
-------------+---------------+--------------+---------------
 pg_catalog  | pg_statistic  |           24 |             0
 pg_catalog  | pg_attribute  |           15 |            14
 pglog       | postgresqlwed |            8 |             7
 pg_catalog  | pg_class      |            7 |             7
 pg_catalog  | pg_type       |            6 |             6
(5 rows) 
Image by Ryan McGuire from Pixabay

Customizing autovacuum report

select *
from pglog.autovacuum_report
where schema_name='laetitia'
order by vacuum_count desc fetch first 5 rows only;
Image by Ryan McGuire from Pixabay

Checkpoint stats

select * from checkpoints_stats;
 avg_checkpoint_freq | median_checkpoint_freq | avg_checkpoint_duration | median_checkpoint_duration
---------------------+------------------------+-------------------------+----------------------------
 06:01:55.491981     | 00:15:00.809           | 00:00:32.447288         | 00:00:06.6675
(1 row) 
Image by Ryan McGuire from Pixabay

More details

pglog=# select * from checkpoints_stats_by_hour;
 date_part | avg_checkpoint_freq | median_checkpoint_freq | avg_checkpoint_duration | median_checkpoint_duration
-----------+---------------------+------------------------+-------------------------+----------------------------
         6 | 03:31:43.989333     | 00:04:59.941           | 00:00:06.375            | 00:00:04.431
         7 | 22:36:05.281917     | 00:15:00.809           | 00:00:17.00225          | 00:00:03.916
         8 | 00:21:22.5615       | 00:15:01.661           | 00:00:17.9445           | 00:00:09.3765
         9 | 00:39:21.674167     | 00:30:17.2415          | 00:00:33.430833         | 00:00:11.2735
        10 | 00:16:09.9175       | 00:16:09.9175          | 00:00:06.032            | 00:00:06.032
        11 | 00:37:25.941667     | 00:12:22.328           | 00:01:40.044667         | 00:02:29.37
        12 | 01:11:27.7846       | 00:40:00.957           | 00:01:05.7842           | 00:00:16.016
        13 | 00:15:19.55725      | 00:07:56.6295          | 00:01:18.24225          | 00:01:21.253
        15 | 04:13:58.037        | 04:13:58.037           | 00:02:30.109            | 00:02:30.109
        16 | 02:58:23.038333     | 00:25:00.427           | 00:00:05.165667         | 00:00:05.012
        17 | 00:15:00.34025      | 00:07:30.346           | 00:00:04.27425          | 00:00:03.195
        19 | 01:02:31.026        | 01:02:31.026           | 00:00:06.4115           | 00:00:06.4115
        20 | 00:09:59.441        | 00:09:59.441           | 00:00:06.241            | 00:00:06.241 
Image by Ryan McGuire from Pixabay

Temp files report

pglog=# select * from tempfile_report\gx
-[ RECORD 2 ]--+--------------
min            | 8192 kB
max            | 374 GB
global_average | 57 GB
10%            | 5720 MB
20%            | 5720 MB
30%            | 5720 MB
40%            | 6000 MB
50%            | 32 GB
60%            | 33 GB
70%            | 36 GB
80%            | 142 GB
90%            | 144 GB
100%           | 374 GB 
Image by Ryan McGuire from Pixabay

Temp file queries report

select * from tempfile_queries order by frequency
 fetch first 1 rows only\gx
-[ RECORD 1 ]--------------+------------------------------
frequency                  | 154
query_tempfilesize_median  | 6000 MB
query_tempfilesize_average | 30 GB
total_size                 | 4655 GB
query                      | call create_mylogs (?,true); 
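
Flipping the sort order surfaces the most frequent offenders instead; the size columns look pretty-printed, so frequency is the safer sort key (a sketch):

select *
from pglog.tempfile_queries
order by frequency desc
fetch first 5 rows only\gx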
Image by Ryan McGuire from Pixabay

Slow queries

pglog=# select *
from top_n_slowest_queries
order by "average" desc
fetch first 1 rows only;

-[ RECORD 1 ]------------------------------------------------
query   | with queries as ( select split_part(message, ? ) ) select query, count ? , avg(duration) as average, percentile_disc ? within group (order by duration) as median from queries group by query order by average desc limit ?
count   | 1
average | 26384.688000000000
median  | 26384.688
Image by Ryan McGuire from Pixabay

Session stats

select * from sessions_stats ;

 min_duration | max_duration  |  avg_duration   | user_name | database_name 
--------------+---------------+-----------------+-----------+---------------
 00:00:00.033 | 00:41:05.38   | 00:05:32.636933 | postgres  | mybench
 00:02:41.311 | 156:33:45.673 | 43:09:49.903    | postgres  | pglog
 00:00:03.538 | 00:00:06.231  | 00:00:04.8845   | postgres  | postgres
 00:00:00.003 | 00:00:00.013  | 00:00:00.007333 | postgres  | template1
Image by Ryan McGuire from Pixabay

Connections report

select * from connections_report ;

 user_name | database_name |        frequency        | connection_number 
-----------+---------------+-------------------------+-------------------
 postgres  | mybench       | 00:23:57.488429         |                14
 postgres  | pglog         | 10 days 12:05:08.038667 |                 3
 postgres  | postgres      | 00:04:20.521            |                 1
 postgres  | template1     | 00:00:00.519            |                 2
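
Like the other reports, this view filters naturally, for instance per database, using only the columns shown above (a sketch):

select *
from pglog.connections_report
where database_name = 'mybench';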
Image by Ryan McGuire from Pixabay

What really matters

  • Your own system is unique, with its own unique problems.
  • Craft your own queries! (see the sketch below)
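
For instance, a hedged sketch counting lock waits per hour, assuming the stock "still waiting for" message wording produced by log_lock_waits:

select date_trunc('hour', log_time) as hour,
       count(*) as lock_waits
from pglog.pglog
where message like '%still waiting for%'
group by 1
order by 1;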
Image by Ryan McGuire from Pixabay

Want to play with it?

https://gitlab.com/l_avrot/pglog

Questions?

Image by Tumisu from Pixabay